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Abstract — Inspired from modern out-of-equilibrium statistical 
pliysics models, a matrix product based framework permits the 
formal definition of random vectors (and random time series) 
whose desired joint distributions are a priori prescribed. Its key 
feature consists of preserving the writing of the joint distribution 
as the simple product structure it has under independence, while 
inputing controlled dependencies amongst components: This is 
obtained by replacing the product of distributions by a product 
of matrices of distributions. The statistical properties stemming 
from this construction are studied theoretically: The landscape 
of the attainable dependence structure is thoroughly depicted 
and a stationarity condition for time series is notably obtained. 
The remapping of this framework onto that of Hidden Markov 
Models enables us to devise an efficient and accurate practi- 
cal synthesis procedure. A design procedure is also described 
permitting the tuning of model parameters to attain targeted 
properties. Pedagogical well-chosen examples of times series and 
multivariate vectors aim at illustrating the power and versatility 
of the proposed approach and at showing how targeted statistical 
properties can be actually prescribed. 

Index Terms — Random vectors, Time Series, Joint Distribution, 
A priori Prescription, Numerical Simulation, Matrix Product, 
Hidden Markov Model, Statistical Physics 

EDICS Category: SSP-SNMD SSP-NGAU SSP-NSSP 



L Introduction 

In modern signal processing, it is very often needed that 
synthetic data are produced numerically in fast and efficient 
manners with (some of) their statistical properties being a 
priori prescribed as closely as desired from selected targets 
(marginal distributions, covariance, spectrum, multivariate dis- 
tribution,.. .)■ This is for instance the case when the per- 
formance of newly developed statistical analysis need to be 
assessed. The theoretical derivation of such performance may 
turn too difficult to achieve, specially when it is intended to 
apply such tools to real-word data, whose properties are not 
well known. Instead, one can resort to Monte Carlo simula- 
tions: Performance are derived from averages over independent 
realizations of synthetic data, that are designed to resemble as 
closely as possible the real-world data. Another example stems 
from Bayesian estimation procedures, that generally involve 
Monte Carlo Markov Chain m or variational based resolution 
schemes (cf. e.g., ||2l, |[3l) to generate numerically independent 
copies of hyper-parameters drawn from (possibly compUcated) 
distributions, derived from the Bayesian formalism. 



For the numerical synthesis of Gaussian time series, with 
prescribed covariance function, the so-called Circulant Embe- 
ded Matrix synthesis procedure ||4l-||6l is nowadays consid- 
ered as the state-of-the-art solution and is currently widely 
used. It has then been extended to the synthesis of multi- 
variate Gaussian time series, with prescribed auto- and cross- 
covariance, notably in Q, ID (see also 191 for variations). 
The synthesis of non-Gaussian time series turns far more 
complicated as the prescription of the full joint distributions 
turns very difficult to achieve while that of the sole marginal 
distributions and covariance functions does not uniquely define 
the process. Several approaches were proposed to address such 
issues (cf. e.g., ifTOl . ifTTl and references therein for reviews). 
Other strategies are based on surrogate techniques: Starting 
from the original real-world data of interest, surrogate copies 
are obtained by randomizing one attribute of the data (e.g., 
the phase of its Fourier transform) while maintaining another 
fixed (e.g., the amplitude of the Fourier transform) (cf. e.g., 
|T2J). Recently, an optimal transport procedure was proposed 
aiming at iteratively modifying the joint distribution of random 
vectors or time series to attain a given target [[T3l . The general 
framework of Markov Chain simulation offers an alternative 
and broad class of solutions, focusing on the modelling of 
local dynamical properties, while not explicitly putting the 
emphasis on a direct prescription of the joint distributions of 
the process. Markov Chain schemes are also widely used to 
synthesize independent realizations of random vectors with 
prescribed properties (cf. e.g., 11141 for a review). 

The present contribution takes place in this long tradition 
of synthetic data design in signal processing, but is however 
rooted in a very different scientific field, that of statistical 
physics. Indeed, inspired from the exact solutions of stochastic 
out-of-equilibrium models describing particle diffusion on 
a one-dimensional lattice with volume exclusion, like the 
Asymmetric Simple Exclusion Processes (ASEP) ifTsl - lflTl . 
the design procedure proposed here is founded on the request 
that the joint distribution of the random vector or random times 
series X ^ = xi,X2, ■ ■ ■ ,xn must be written as a product 
P2£_ oc nfc=i Rd{xk), as in the case of independent compo- 
nents. However, the RdS are no longer univariate distributions, 
but instead d dimensional matrices of valid unnormalized 
distributions. In statistical physics, this matrix product form 
of the joint distribution was envisaged as a practical ansatz 
to find exact solutions of the master equation associated e.g. 



to the ASEP model. In signal processing, this enables us 
to devise a theoretically powerful and practically versatile 
and efficient procedure to define and generate numerically 
random vectors or random times series, with a priori prescribed 
statistical properties. A preliminary and partial attempt relying 
on specific choices of matrices and focusing on stationary time 
series only was presented in IfTSl . It is here extended to a more 
general and versatile setting. 

Definitions of this matrix product joint distribution frame- 
work and the general consequences in terms of statistical prop- 
erties are derived in Section |II] Notably, a general condition to 
ensure stationarity of the time series is obtained. In Section HIH 
the dependence structures that can be achieved from this model 
are studied in-depth and the practical tuning of the time-scales 
that can be input in these dependence structures is established 
in details. In Section |IV| it is first shown how this matrix 
product framework can be explicitly recast into that of Hidden 
Markov Models, hence providing us with a fast and efficient 
practical synthesis procedure; and it is, second, explained how 
targeted marginal distributions and other statistical properties 
can be attained. Section [V] consists of two different sets 
of pedagogical examples aiming at illustrating the efficiency 
and versatility of the tool: First, sets of time series whose 
marginal distributions, covariance functions and some higher 
order statistics are jointly prescribed, are produced; Second, 
sets of random vectors, with different marginal distributions 
are synthesized and it is shown how to tune the correlation 
and fourth order statistics amongst the components of such 
vectors while maintaining fixed the marginal distributions. 

II. Joint probability as matrix product 
A. Definitions and general setting 

Inspired from out-of-equilibrium statistical physics models 
(cf. e.g. ifTSl - lflTl ). a general framework for the design of 
joint distribution functions for random vectors ^^, of size A^ 
is devised. It is based on product of matrices and relies on 4 
different key ingredients defined below. 

First, let rf G N*. Second, let A denote a fixed non-zero 
non-random dx d matrix, with positive entries, and C{M) an 
associated linear form defined (for any matrix M) as: 

C{M) = tr {A^M) . (1) 

Third, let £ denote a fixed non-zero non-random dx d matrix, 
with positive entries £ij. For reasons made clear in Section Hill 
£ will be here referred to as the structure matrix. Fourth, 
let Matrix V consists of a d x d set of valid (normalized) 
distribution functions {'Pij{x)}i=i d:j=i....,d- 



Rd{x) =£®V{x), 



(2) 



where ® denotes entry-wise matrix multiplication. Let X = 
{X„}i<„<jv denote the random vector explicitly defined via 
its joint distribution: 



^x[,xi, 



.,Xm) 



'^{ntiM^u) 



C{£'^) 



(3) 



rJV 



where Y\k=i^d{xk) — Rdi^i) ■ ■ ■ Rdi^N) denotes the ori- 
ented product (i.e., the order of the factors is fixed). The joint 



choice of positive entries for matrices A and £, and of vaUd 
distribution functions in V is sufficient to ensure that Eq. (O 
defines a valid joint distribution function. 



B. Marginal distributions, moments and dependence 

From these definitions, using commutativity of integration 
and matrix product, a number of statistical properties of X m 
can be analytically derived. 

Univariate or marginal distributions take explicit forms: 



Vk{Xk=Xk) 



C (/ Rd{xi)dxi . . . Rdjxk) ■ ■ ■ RdixN)dxN) 

Ci£^) 
C(£'^-'Rd{xk)£''-'') 



C{£^) 



(4) 



This shows that the marginal distributions necessarily consist 
of weighted linear combinations of the Vi.j, 



:,{Xk = x) = y^c^j,fc7^i,j(a 



(5) 



»j 



where the Cij^k are functions of A and £, satisfying, for all 

Further, let us define the collection of matrices, M{q) ~ 
j^x^Rd{x)dx = £ (E) j^x'^V{x)dx, whose entries read: 



A/(q)„. =£,j / x'^V^,J{x)dx. 

Univariate moments also take closed-form expressions: 

C{£''-^M{q)£^-'') 



K] 



C{£ 



N\ 



(6) 



(7) 



The p-variate distributions (with fci < • • • < kp) can also be 
made explicit. 



P(Afc, =Xk,,...,Xk^=XkJ = 

C (f^i-i (uVi i?d(2;fcjf ^"-+^-^'-1) Rd{xk^)£''-'' 



C{£^) 



(8) 



as well as the p-variate moments (with q,. the order associated 
to entry /ov): 



E 



n^ 



C (f^i-l (YZZ\M{qr)£^^^'-''--^) M{qp)£«-^^ 



(9) 



C{£^) 



Eq. (|9]l constitutes a key result with respect to applications as 
it clearly shows that the joint statistics of order q of X_j^ can 
be prescribed by the sole selection of matrices M{q). This 
will be explicitly used in Section |V] 



C. Stationarity condition 

Eq. ^ shows that the in general non-stationary nature of 
X w stems from the non-commutativity of the matrix product. 
To enforce stationarity in ^^, a commutativity property for 
A and £ must be added, 

[A^, £] = A^£ - £A^ = 0, (10) 

which ensures that Vx, C {£Rd{x)) = C {Rd{x)£). 

Under Eq. ( fTOl i. the marginal distribution (cf. Eq. (|5}) 
simplify to: 

C{Rd{x)£^-'] 



?(^ = a^) = 



C{£ 



N\ 



^ 2_^ '^^•3 ' i'Jy'' 



(11) 



and p-variate distributions and moments become (cf. Eq. (|8) 
and Eq. Q) 

P{Xk, = Xfei , . . . , Xfep = Xfep) = 

c((irrZlRdixk.)£'"'+'-'"'-')Rd{xkJ£''-^''^-'''^-' 




C{£ 



N\ 



(12) 



-C ((n^ll A/(q,)£:^'-+^-^'-i) M(gp)£-^-('^>-'^-i)-i) 



C{£^) 



(13) 



These relations clearly indicate that the vector {Xn}i<n<N 
can now be regarded as a stationary time series: All joint 
statistics depend only on time differences, fc^+i — fcr- 

The sole matrix A satisfying Eq. (fTol i for all matrices £, 
is the identity matrix, in which case C consists of the trace 
operator However, the trace operator also induces automat- 
ically a circular correlation structure, I > 2k, MXkXi = 
KXkXiy+2k-i, a highly undesirable consequence for appli- 
cation purposes. 

Alternatively, stationarity can be obtained by choosing 
jointly specific pairs {A,£), such as 

A.J = (l/d) (14) 

and £ a so-called doubly stochastic matrix, defined as: 

Vi,j, ^£-fcj =^£,,fc = l. (15) 

k k 

Indeed, such choices yield 

Vfc, M, C {M£'') = C (M) , 

that leads to further simplifications of Eq. (flU to Eq. (fljT l. The 
marginal and partial distributions of X_j^ no longer explicitly 
depend on the sample size N, also the marginal distribution 
is independent of £: 

Fk{Xk = Xk)=C{Rd{xk)), (16) 

P(Xfei = Xfej , . . . , Xk^ = aifcp ) = 

I (17) 



/p-i 



jO( (l[Rd{xk^)£'''-+'-''''~' ] Rdi^kJ 



E 



n^ 



clll[AI{qr)£'' 



-kr-1 



Miqp) 



(18) 

Elaborating on a preliminary work (cf. llTSi ). this particular 
setting will be used in Section IV-AI to design efficiently 
stationary time series. 

III. Dependence structure 

Eqs. (O and ^ indicate that Matrix £ essentially controls 
the dependence structure within Vector 2Ln^ hence its name, 
via the collection of its powers £", n — l,...,N, while 
the matrices M{q) fix the amplitudes of the dependencies 
at order q. Analyzing the forms possibly taken by the f " is 
hence crucial to understand the potential dependence structures 
of X M achievable in this framework. Notably, the classical 
distinction between diagonalizable and non-diagonalizable £ 
plays a crucial role. This is studied in detail in this section for 
the correlation structure. 

A. Diagonalizable structure matrix 

Let us assume that £ is diagonalizable, with eigenvalues 
Ai, . . . , Ar. Then, £ can be decomposed into r sub-matrices 
El,. . . ,Er such that 



4=1 



£'' = 



and Eq. ^ can be rewritten as: 



MXk 



J2z,j \ 



^'xf~''C{E,Rd{x)E,) 



ELi^f^iE,) 



(19) 



(20) 



which explicits the dependence in k and is reminiscent of 
Eq. (|5]l given that the C {EiRd{x)Ej) consist of linear com- 
binations of the Vi.m{x). Further, for the 2-sample statistics, 
or covariance function, Eq. (|9) simplifies to: 

^"-^Xf-'X'^'-'C {E,M{\) E„,M{1) Ej) 



E [XkXi_ 



E. 



A," 



ELi Af /: {E. 



(21) 



Assuming stationarity, i.e., Eq. (fTOl i. the relation above further 
reduces to: 

E,.rn Af -' (^) '"'"' C (M(l) E^Mil) E,) 



EliK^iE,) 



(22) 



which shows that the covariance function consists of the sum 
of weighted exponential functions exp(— (fc — I — l)(ln Aj — 
In Am)), with at most rM{rM — l)/2 = C"") characteristic 



time scales, r,. 



(InjAjl — In I Am I) , where vm stands 



for the number of eigenvalues of Matrix £ with different 
modulus. A preliminary study of the covariance function in 
the stationary and diagonalizable case has been devised in ifTsl 
and an example is worked out in Section IV-All 

Without assuming stationarity, this exponential decrease of 
the covariance function still holds. Indeed, let us assume that 
C{Ei) 7^ and that the norm of Ai is strictly larger than 
the norm of the other eigenvalues. Then, the normalisation 
term C {£'^) can be approximated in the limit N -^ +cx) as 



£ (£^) ^ X^C{Ei). Combining this asymptotic form with 
Eq. (|4|i yields: 






'J 



C{£^) 



£{EiRd{x)E,) 



(23) 



Ai£(^i) ' 
and combining this result with Eq. (ISTT i leads to: 



Ai 



C{Ei)Xi 



These equations show that using a diagonalizable £, with 
a dominant eigenvalue, implies that asymptotically, i.e., in 
the limit N — > +cxj, each component of Vector X j^r shares 
the same univariate distribution, and that the autocovariance 
functions reads as a sum of exponential functions, depending 
only on |fc — ^|. In the limit N — > +cx). Vector 2Ln i^ hence 
asymptotically quasi-stationary. 

B. Non-diagonalizable structure matrices 

Non-diagonalizable matrices £ draw a very different land- 
scape. For illustration, let us consider the case where £ = 
Id + H with Id is the Identity matrix and H any nilpotent 
matrix of order p + I (i.e., H''^'^ = while H'' ^ 0, 
when 1 < k < p), chosen as a standard example of non 
diagonalizable matrix. Then, for any k > p, one has: 



3=0 



H\ 



which combined with Eq. (|4]i yields 

FfX - , . - ^H^Zl^im^i^:^!!^)^ ,25) 

MX, - X,) Y.u{^)cm ''^ 

and, for the covariance function: 

(26) 
To gain a better grasp of what these equations imply, let us 
study their asymptotic behaviors. Using 



N\ NP 



-,iV- 



and the assumption that C (HP) ^ lead to 

NP 
C{£^) ^--C{HP) 
^ ' p'. 

and, with the assumption that k diverges with N, to 
p .y . ^ fk-l\fN-k\C{lPRdix)IPl 



E 



P 



J 

kV C{WRd{x)W) 
n) C{Hp) 



Compared to Eq. (|20] |. this indicates that the marginal distribu- 
tion of component k reads as a mixture of distributions, with 
weights depending on the relative position k/N, rather than 
the absolute position k, as is the case for diagonalizable £. 

The covariance function also turns quite different from the 
diagonalizable case, 

E [XkXi] « Y. 

-i+j + TTl— p 

p\ I k_V fk_l_Y f _ J_y" C{H'MiWMiH"'y 
ilm\j\ \n) V ^ / V " ^/ C{HP) 

(28)' 



with the occurrence of algebraic terms, (-^jy^) ' ^^^^ indicate 
long-range correlations developing along the whole vector 

This specific choice for non diagonalizable £ enables us to 
figure out that, combining a block diagonal Jordan reduction 
of £ and results obtained in the diagonalizable case, the 
covariance function consists, in the very general case, of a 
weighted sum of algebraic and exponential decreases. 



IV. Design and Synthesis 

The elegant and compact definition of X^ via its joint 
distribution F(X j^r) in Eq. (O gives little hints on how to 
construct random vectors with desired prescribed properties or 
on how to synthesize them numerically. The present section 
addresses such issues: First, given that d, A, £ and V are 
chosen, Eq. (|3]l is reformulated into the framework of Hidden 
Markov Models and the corresponding numerical synthesis 
algorithm is devised; Second, given d, A, £, an algorithm for 
constructing a Matrix V with prescribed marginal distributions 
and dependence structures is detailed. 



A. Hidden Markov Chain 

First, generalizing the fact that the entries of the matrix 
product (ABC) reads {ABC)ij = J2k i'^i,kf'k.ici,j, enables 
us to recast Eq. (O into: 



N 



(Xl 



, • • ■ ,xn) = E ^® n ^(2^fc)r.-i,r„ (29) 



fc=i 



withr = {ro,...,rfc,...,rjv}e [i,...,rf]^+i and 



N 



<!.) 



Ci£-) l\ 



ll^rfc_i,rfc, 



(30) 



(27) 



where ^p k{T) — 1. Therefore, «;(£) and P(X) can be 
read respectively as the probability function of T and as a 
'*(£) -weighted mixture of laws, each defined as the product 

Second, let Fx denote the set of chains starting at index t 
and stopping at index A^ — 1: r>t = (Ft, . . . ,Fjv-i)- For a 




Fig. 1. Transition graph. Example for d = 6 and £. = aoI^+aiJ^-\-a2J^ 
(ao +ai +02 = 1). 



given pair (rojFjv), Eq. (|30] | shows that: 

p(rfc=.?|ro = 7o,...rfc-i-7fe-i) 

^70,71 ■ • • ^Ik-iyj Z^T>k+i J'^fc + i • • • i-rjv-i.Tjv 



•^70, 71 ■ • • ^7fc-2,7fc-l Z.^r>fc ^Ik-lJ^k 



{£^-k) 



j.r. 



lk-1,0 rcN-k+l\ 



'''{£ 



= p(rfc=j|rfe_i = 7fe_i) 



''7fc-i.rN 



(31) 



and hence that F consists of an inhomogeneous d-state Markov 
chain, with transition probability matrix at step k reading: 



{£^-^) 



p(r,=,|r,_i = *) = £,, ^^^^3^ 



3,T^ 



(32) 



This shows that £ can now be interpreted as (the basis for) the 
transition matrix underlying the Markov chain F, as illustrated 
in Fig. [T] 

Third, Eq. dSOl l enables us to show that the initial distribution 

for (FojFat) reads: 



^(Fo = I, Fjv = .?) = 



C{£ 



N\ 



(33) 



Combined together, these three steps enable us to synthesize 
numerically Vector X_f., as defined from Eq. (|2]i, using a 
Hidden Markov Model, with 4 hidden states: the current state 
Ffc, the previous state Vk-i and the pair formed by the initial 
and final states (FqjFtv). Combining Eqs. ( |32] | and ( l33T l, the 
synthesis algorithm can be sketched as follows: 

Step 1 : Initialization: 

Use Eq. (l33T l to generate the states Fq and Fjy. 
Step2: Iteration on k = 1, . . . ,N: 

Step2.1: Choose at random state F^., according to the tran- 
sition probability given in Eq. (l32] i: 
Step2.2: Generate Xk according to 7'rfc_i,rfc- 

Under the particular choices Aij = 1/d (cf. Eq. ( fT4l i) and 
£ doubly stochastic (cf. Eq. (fTSb). it has been shown that there 
is no need to set a priori the final state F^v and that the Markov 
chain becomes homogeneous, hence that Eqs. (l32b and ( |33] | 
can be rewritten as (cf. ifTSI ): 



P(Fo 



1 
d' 



(35) 



P(Ffe=j|Ffc_i 



£ij and 



(34) 



Such simplifications show that £ exactly defines the transition 
matrix for the Markov chain F and, moreover, that the initial 
state of the Markov chain Fq follows a uniform distribution. 
The computational cost of the synthesis procedure can be 
evaluated as follows. For the general case where £ is not 
an invertible matrix, the algorithmic complexity is dominated 
by the cost of the computation of matrix powers, and hence 
a total complexity in 0{d^N\nN) (with a 0{d^) matrix 
multiplication cost and a ©(In A^) matrix power computation 
cost). When £ is invertible, the global cost reduces to 0{d^N) 
(as matrix multiplications only are required). With the choice 
of doubly stochastic matrices £ and Aij — 1/d, the compu- 
tational cost is further reduced to 0{N\nd). Therefore, the 
synthesis procedure scales efficiently for large signal (large 
N) and large complex dependence structure (large d) (with 
numerous further potential optimizations for specific cases 
such as sparse structure matrix). 

B. Design 

Let us now address the issue of how to select the elements 
constitutive of the model, given a targeted Xy . 

Prescribing directly a targeted joint distribution function 
consists of a difficult theoretical problem, that also often does 
not match application purposes. Instead, it is often natural to 
split this general question into two simpler sub-problems: on 
one hand, fixing the dependence (or covariance) structure; on 
the other hand, fixing the marginal distribution(s). Sections Ull 
and Hn] indicate that ('^2^) defines the (maximal) number of 
time-scales involved in the dependence (covariance) structure 
function, while the joint choice of A and £ controls the 
shape of the dependence (e.g., stationarity, cross-covariance, 
...). Examples of construction enabling us to reach targeted 
dependencies are detailed in Section |V] 

Let us for now assume that d, A, and £ are chosen (hence 
that the dependence structure is fixed) and let us concentrate 
here on designing Matrix V, so as to reach targeted marginal 
distributions. Also, for the sake of simplicity, let us concentrate 
on the stationary case first. Extensions to the non-stationary 
case will then be discussed. 

It is first worth emphasizing that prescribing marginals and 
dependence cannot be treated as fully independent problems 
(only the global structure of the dependence is not related 
to the choice of Matrix V). Indeed choosing V fixes both 
univariate distributions (according to Eq. (|5)) and the coef- 
ficients governing the relative amplitude of the dependence 
structure, through the matrices AI{q) (cf. Eq. (|9]l). Moreover, 
the prescribed marginal imposes some constraints on the 
matrix M{q). For instance, Eq. (fTTT i implies that: 

M{l)^^j < f,jE [X\X >x], x = Fg^l - c,,j), (36) 

with X generated according to P5 (as defined in Eq. (fTTT)). 
and where Fs denotes the cumulative function associated 
to the marginal distribution Pg. The difficulty thus consists 
of devising a constructive procedure for Matrix V enabling 
practitioners to reach jointly the targeted marginal distribution 



P5 (according to Eq. (fTTT i) and targeted Matrices M{q) (cf. 
Eq. (|9]l). To disentangle both groups of constraints, the Vij 
are parametrized using a matrix of (strictly) positive functions 



dtjix)- 



C'^^j-P^A^) = ^^^^"^I^^six). 



(37) 



Ei,m.9;,m(a;) 

With this parametrization, Eq. (fTTT i is automatically satisfied. 
To ensure that the Vi.j are probability distributions, it is needed 



that 



?^j(^) ^.(x)dx = c,,,. 



Y.l.m9l..m{x) 



(38) 



Splitting gi,j into gij{x) = iJ,ijhij{x) permits to use the free 
parameters fiij to solve Eq. ( |38] ). for any fixed h = (hij) : 

fJ'[h]i,jhij{x) 



E;,mMW/,m/»i,m(a 



-Ps{x)dx — Ci, 



(39) 



Plugging the solution (//[/iji.j) of Eq. ( |39] l into Eq. dJTl ) yields: 
The last step is to find h such that: 



mr.KA^) „^^^y (40) 



^(9), 



£.. 



x'^'P[h]ij{x)dx 



(41) 



where Af (g) are the prescribed moment matrices. Essentially, 
this parametrization of V in terms of hij{x) enables us to 
recast Eq. (|6]l as a non linear equation in hij (x) (cf. Eq. (|4TI)). 
where hi,j{x) are strictly positive functions. To practically 
find solutions to Eq. (HTt . the functions hi,j{x) are further 
constrained to be chosen amongst an arbitrary parametrized 
kernel family A'^: hij (x) ~ Kp. , [x). A natural choice for the 
kernel parameters p would be to consider vectorial parameters 
with a dimension equal to the number of targeted moments 
orders. For instance, when the targeted P5 is a normal law 
and A/(l) and M{2) are fixed a priori, it is natural (but 
not mandatory) to select K„ia = ■^m,a- Further examples 
are given in Section |V] Within this framework, Eqs. (l37t 
to ( l4n i can be solved numerically step-by-step. Examples of 
results obtained with the constructive method are detailed in 
Section |V] and illustrated in Fig. |4] 

In summary, to design Vector 2Ln with prescribed univariate 
distributions and dependence structure, the following proce- 
dure is to be followed: 

1) Select d, £ and A in agreement with the targeted 
dependence structure. 

2) Choose the moment matrices M{q) to determine the 
dependence coefficients. 

3) Choose a kernel family Kp. 

4) Solve Eqs. ( |37] | to ( |4TI ) to compute the distribution 
matrix V in agreement with the targeted P^s. 

A Matlab implementation of this procedure is available upon 
request. The potential of this method is further explored in 
Section [V] where simple (hence pedagogical) examples are 
devised and discussed. 

For the general non-stationary case, the above algorithm 
cannot be used as is and a general solution is still under 
construction. However, for the restriction of the non-stationary 



case to the fairly broad and practically efficient case of 
block structured matrices, detailed in Section IV-BI the above 
algorithm can straightforwardly be applied independently for 
each component of the targeted random vector As illustrated 
in Section IV-BI both the block structured matrix framework 
and the above algorithm provide practitioners with a rather 
general tool to define and synthesize numerically a large 
number of realizations of random vectors. 

V. Illustrations 

This sections aims at illustrating the potential of the general 
construction above in two specific contexts: First, we concen- 
trate on stationary vectors 2Ln^ i-^-' 011 stationary time series 
synthesis; Second, we illustrate for trivariate random vectors 
how to vary the joint distribution functions while maintaining 
fixed the three different marginal distributions. 

A. Stationary time series 

1) Circular structure matrix: To design stationary time 
series, elaborating on a preliminary work (cf. lITSl ). we now 
select Aij = (1/d) and a particularly fruitful choice of doubly 
stochastic matrices: 



d-l 



d-1 



k=o k=a 

where Jd G Md{^) is defined as: 

/O 1 • 



(42) 



Jd 




Vl 



o\ 



1 

0/ 



and J^ = Id, the Identity matrix. It is easy to verify that 
[.4^, Jd] = 0. Therefore, [^-^,£] = and 2Ln i^ stationary. 
Moreover, the eigenvalues of Jd are easily computed as 
functions of the roots of order d of the unity, u = cxp(2i7r/(i), 
which motivates the use of this matrix: 



A,, 
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The eigenvectors of Jd also have simple expressions: 

From these definitions, it is useful to change of basis: 

/O ... 0\ 



(43) 
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A = BAB- 



E = B-^£B = 



Vo 

Ai 



(45) 



Vo 

M{q) =B-^M{q)B 



Such calculations lead to a simple expression of the depen- 
dencies: 



nxf^T] 



E ''-' 



XrM{qi)daM{<l2kd 



(46) 



For ease of notations, let us define the two vectors {Cj^Z[, ))k = 

J2i^{q)i,k, {RM(q))k = J2jM{q)k^j and let T denote the 
(normalized) discrete Fourier transform : 
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"' l-l 


ITvkl 
d 


I 


This leads to: 






E [X;^X?] - E [X^] E [X?] = 






ld/2] 

X > : 




_ t-l .2,r(f-l) 



(47) 



where [zj stands for the integer part of z, nik = 1 if 
2k = d and irik = 2 otherwise, and ?le {} denotes the real 
part. In this specific case, there is thus at most [d/2\ + 1 
distinct time scales in the signal. As a pedagogical example, 
choosing £ = a^I + aiJd (where ao + ai = 1) enables 
to further study the form of the covariance function. The 
eigenvalues of £ read Afc — an + aie~^ and can be 
rewritten as Afc = e ^^ e ^^ , A: = 1, . . . [d/2j. Combining 
these results with Eq. ( |47] | shows that r^. = — l/ln|Afe| and 
Tk = 27r/arg(Afc) are characteristic dependence time scales 
and periods that depend on the joint choice of d and ao. It 
can also be shown that r^. ~q(,qi^o [q^och (l — cos ^f^)] 
Moreover, increasing d increases both the number of distinct 
dependence time scales and the ratio of the smallest to the 
largest such characteristic time scales, which can be shown to 
vary asymptotically (apcn — > 0) as ((i/27r)^/2. 

2) Example: To illustrate the potential of the proposed 
time series theoretical construction and synthesis procedure, 
a pedagogical example is devised. First, it is intended to 
design a stationary Gaussian time series X of size N, with 
a Gaussian marginal distribution P{x) = Ao^i, an auto- 
covariance function that consists of a 5 function (i.e., no 
correlation and M{1) = 0), but dependence as fixed here 
by the 4— th order statistics, i.e., by prescribing M(2). By 
definition, X is hence not a jointly Gaussian process (i.e., its 
joint distribution departs from a multivariate Gaussian law). 
Second, it is intended, using the same d, A and £, to design 
a second time series Y, independent of X, but with same size 
N, same marginal distribution and autocovariance function but 
different dependence (via a different M(2) matrix) and hence 
a joint distribution different from that of X. 

For these constructions, d = 6 is selected to obtain 3 distinct 
time scales and £ = aoId + aiJd (ao+ai = 1) for the sake of 
simplicity. With these choices, the three available time scales 
read: 



Tl 



, T2 

>o ai 



T^^^3 ^^7^- (48) 



2 1 

T3 ~ 

Qi-i-i) «! Qi-i-0 3ai Qi->o 2ai 

The two time series X and Y differ only in the autoco- 
variance of their squares (E [X^X^] - ^ [2^1] E [X^])- This 
implies that the matrices A/(2) of the two signals differ. An 




Fig. 2. Numerical Synthesis. Two different time series witli tlie same 
marginal distribution (mixture of two Gaussians, cri = 0.5, o"2 = 2), the same 
covariance functions (chosen as 5 functions) but different covaiiance functions 
for their squares, hence different prescribed joint distributions (qq = 0.98). 
Left side X, right side Y_. First hne, one realization of the time seiies. Then, 
from top to bottom, estimated (solid black lines) and theoretical (dashed 
colored lines) marginals, correlation functions and correlation functions for 
the squared time series. 



interesting simplification at this step is to choose the matrices 
A/(2) to be of the form M(2) =, aoD + aiDJd where D is 
diagonal. In this case Eq. ( |47] i further simplifies to: 



E[X^,xn-E[X^,]E[X?] = 
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Y, mfe|^(diagp))fe|2fRc|(ao + a 
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(49) 



where diag{D)i = Dii is the diagonal vector of D. Therefore, 
choosing for X, 



D,,_,, = 



a I 1 even 
(7-2 otherwise 



ai + (T2 = 1 (50) 



induces that T3 only, i.e., the shortest time scale, appears in 
the auto-covariance of X^: 

E [X'.Xf] - E [X^] E [Xf] = ^-^^-^{ao - a.f . (51) 



Conversely, imposing for Y 

I (Tl I < 3 



D, 



02 J > 3 



O"! + 0-2 = 1 



(52) 



leads to the fact that both ri and T3 contribute to the autoco- 
variance of Y^: 
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ir^i = 



36 






(53) 

However, ti « 4t3 is dominant at large t and hence constitutes 
the leading term. 

With these pedagogical and documented choices of A/(2) 
for X and Y, we can use the numerical procedure devised in 
Section HV-BI with a Gaussian kernel Krn.a{x) ~ exp(— (a; — 
m)^/(2cr^)), to compute the associated distributions Vij. The 
hidden Markov chain procedure described in Section IIV-AI is 
then used to synthesize the signals X and Y . The analysis 
of these synthesized signals produces the results reported 
in Fig. [5] This figure presents for X (left column) and Y 
(right column) a particular realization of the time series, 
the estimated and targeted univariate distributions, covariance 
functions and covariance functions for the squared time series 
(from top to bottom). It clearly shows that X and Y have the 
same marginal and covariance but different joint distributions 
(as targeted). Though sharing the same pedagogical goal, the 
examples devised here significantly differ from those presented 
in ifTSl as both the targeted marginal and the constructive 
Kernel differ 

Using the same construction procedure, other examples with 
the same (non necessarily Gaussian) marginals, same (non 
necessarily 5-) autocovariance functions but different joint 
distributions could as easily be devised and are available upon 
request. 

B. Random vector 

1) Multivariate design: Let us now consider the design of 
a random vector whose components have different univariate 
distributions and are dependent. As seen in Section |III] for a 
fixed d, when N increases, X^ tends to become stationary. 
A simple way around this drawback is to increase the size of 
Rd with the size of X m. by choosing d = {N + l)d*, while 
keeping a block triangular superior structure, to avoid a too 
large increase in the number of degrees of freedom: 



/O,. 4\\x) 



Rd{x) = 
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VOrf. 



where 0^. is a 0-block of size d* x d* and wJ are d* x d 
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matrices, as defined in Eq. (|2]i: 



+ 00 



j*k) 



V)j\x)dx^l. 



Let us also choose the projection matrice A such that 

/Od- ... OdA 

A=\ ■■ : , (55) 

\A* ... OdJ 



where A* is a positive entry matrix of size d* x d*. If one 
further defines C* (M) = tr (^A*'^ M) (as a Unear form on 
d* X d* matrices), Eq. Q becomes: 
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By further restricting to the case fC**^) = £*, the joint 
distribution simplifies to: 



[Xl, 
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(56) 



The joint distribution in Eq. (|56] l consists of a variation on 
Eq. (|3]l, where the constant probability matrix V has been 
replaced by a varying probability matrix pf**^). The general 
formulation in Eq. ( l54l i and Eq. (|55] | shows that this particular 
setting is nothing but a convenient notation that however 
corresponds to a subcase of the general framework: Therefore 
all results developed in Sections |II] to |IV] remain valid. 

It is hence straightforward to derive the corresponding 
expressions for the univariate distributions, the p-samples 
probability distributions and moments: 
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¥{Xk^ = Xfci , . . . , Xfep = Xfep ) 
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(59) 



with M^**^) ((?) = J_^ xtR^^J {x)dx. Using this particular 
structure for the matrix Rd permits to define each component 
of the random vector X^ with a relative independence. 
Notably, the univariate distribution of X^ depends only on 
•p(*fc) Moreover, with these matrices A and Rd, the hidden 
Markov chain F starts in the first upper diagonal block of size 
d* and must end in the last upper diagonal block. In other 
words, at each step, the hidden Markov chain goes from the 
/s*'* block to the (/c + l)*'* block using the transition i,j which 
corresponds to a probability law belonging to Rd. ■ 

2) Trivariate Example: Let us now develop the construction 
of two different trivariate random vectors X_j^ = {Xi, X2 , X3 ) 
and y^ — {Yi,Y2,Y3), with marginal distributions set to 
a Gaussian distribution A/o.i, for Xi and Yi, to a Gamma 
distribution, with shape parameter a = 2 and scale parameter 
(3—1 for X2 and Y2, and to a Gamma distribution with 




Fig. 3. Bivaiiate probability distributions of 2Ln (Left) and y^y (Middle) and Zjy(Right), for Pairs 1-2 (top). Pairs 2-3 (middle). Pairs 1-3 (bottom). For 
each vector: Left: p = 0.1. Right: p = 0.8. 






Fig. 4. Marginal distributions P^ (solid lines) for 2Ln (left), Y_j^ (middle) 
and ZjY (right) and the designed Cijf^Vi.j (dashed lines); for Components 
X\, Yi, Z\ (top); Components X2, Y2, -^2 (middle); Component X3, Y>,, 
Zz (bottom). 



a = 1 and f3 ~ 2 for X^ and Yjj. To illustrate the potential 
of the tool, 2Ln ^nd Htv ^Iso have the same correlations, but 
different joint distributions. 

First, d* ^ 2, £* = a^Id + ai Jd and A*j = i^/d*) are 
selected. Second, to control correlations, the moment matrix 
M (*'') (1) is set, for both vectors, to (k = 1, 2, 3): 



£* 



\mk,2 TOfc,2 



M^**^' (1) 
with the constraints: 

TOfc,l +TOfc,2 = 2E[Xfe] . 

Defining Aj. = m.fc,i ^ "T-fc,2- the covariance reads 

Cov[Xi,X2] - Cov[yi,y2] - (1 - 2ao)AiA2, 
Cov [X2, X3] = Cov [^2,^3] = (1 - 2ao)A2A3, 
Cov [Xi,X3] = Cov [Y^Ys] = (1 - 2ao)'AiA3. 



(60) 



Therefore, the covariance for any two consecutive components 
depends linearly on ag, and, when A^A; > is maximum 
for ao = 1, vanishes at ao ~ 0.5 and is minimum for ao = 0. 
The two trivariate joint distributions can now be made 
different via their moment matrices, A/'**^' (g), of order q = 2, 
is set to: 



which for X m 



Af (*i) (2) = £* 
Af (*2) (2) = £* 



and for F^ to: 



Af (*i) (2) = £* 



\ ^\ , Af (*3) (2) ^ £* 
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(61) 



A/(*3)(2) =£* 



2.75 
9.25 



2.75 
9.25 



8 8 
8 8 



(62) 



*k) 



To construct the V''*'^\.j from the procedure developed 
in Section HV-BI a Gaussian kernel Km,a{x) = exp(— (x — 
r?T,)^/(2(T^)) is chosen. The resulting probability matrices 
p(*k) jj^yg Qjjjy 2 distinct components. Fig. |4] illustrates how 
the weighted density Ci,j^k are combined in order to obtain the 
marginal distribution P^ for X ^j (left) and F^ (middle). 

Furthermore, changing the Kernel K parametrizing VI 
also constitutes an efficient way to further vary the joint 
distributions, hence introducing further versatility in the pro- 
cedure. For example, K could be chosen as the Gamma 
distribution family, or the union of the Gaussian and Gamma 
families. To illustrate this, let us now construct a third trivariate 
random vector Z_^, sharing the same marginals and covariance 
X M and y^ (it actually shares exactly the same matrices 
Af^**^) (1) and M^*^'^ (2) as those of X^), though obtained 
from Kernel Km a{x) = (0.1 + ((x — m)/cr)^) exp(-((a; - 
m)/an 

Bivariate partials distributions (rather than trivariate joint 
distributions, for clarity) are shown in Fig. |3] for the pairs 
(Xi.Xz) (top row), (X2,X3) (middle row) and (XijXa) 
(bottom row) for X_^ (left block), F^ (middle block) and 



10 



ZjY (right block), for negative (left column in each block) 
and positive (right column) correlations. These plots clearly 
show that, for any two pairs, the bivariate distributions (hence 
a fortiori the joint trivariate distributions) are different for the 
three vectors, though their components have same univariate 
marginals and same correlations. Furthermore, Fig. 2] shows 
how the targeted marginal distributions P^ are obtained by 
summation of the Cij_k'Pl which were produced by the 
algorithmic procedure described in Section IIV-BI Note that 
with the chosen setting of the proposed examples, the sum- 
mation has been a priori limited (for simplicity) to only 2 
different T', , , while rf* x d* = 4 distinct distributions would 
have been available in general. Interestingly, comparisons for 
a given component (i.e., for a given row) amongst the three 
vectors illustrates that a same marginal distribution P^ is 

(*k) 

attained from different summation of the Ci , tT^,- , for the 
three vectors, which constitutes a signature of the fact that the 
joint distributions of X^, I^at ^nd Z^y are different. 

Here, the example was chosen trivariate, as a trade-off 
between ease of exposition (3 partial distributions of pairs of 
components remain easy to display) and demonstration of the 
potential and richness of the tool. Multivariate examples are 
however just as easy to construct. 



VI. Conclusion and Perspectives 

A general framework for the theoretical definition and the 
practical numerical synthesis of random vectors and random 
time series, with a priori prescribed statistical properties, has 
been fully worked out, based on a matrix product formal- 
ism, and inspired from out-of-equilibrium statistical physics 
models. Its ability to shape jointly marginal distributions and 
the dependence structure has been studied both theoretically 
and practically. Pedagogical examples illustrated the versatility 
and richness of the procedure in actually attaining targeted 
properties. Remapping this matrix product framework onto 
that of Hidden Markov Models enabled us to devise an 
efficient practical numerical synthesis algorithm. Also, a de- 
sign procedure enabling to tune the elements of the models 
so as to reach desired targets has been obtained. Both for 
design and numerical synthesis, Matlab implementation of 
the described procedures are available upon request. 

Comparisons of the proposed approach to other numer- 
ical synthesis frameworks in terms of potential, versatility, 
efficiency, precision and implementation are under current 
investigations but are beyond the scope of the present con- 
tribution. Further, the benefits of using other matrices A and 
£ will be explored. Moreover, having maintained the writing 
of the multivariate distributions as a product, as is the case 
for independent components, leads to possible computations 
of the distribution of the maximum W ~ maxXi or sum 
S ~ J2-^i^ of '^he components of X at- Such results are 
of premier importance for the use of such models in sta- 
tistical physics applications as well as in signal processing 
for problems involving statistical properties of extremes or 
time-averages as ensemble average estimators. This is being 
investigated. To finish with, the potential use of this synthesis 
tool to generate independent copies of sets of hyper-parameters 



in Monte Carlo Markov Chain numerical schemes constitutes 
a natural track to investigate. 
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